Solution for the properties of a clustered network
We study Strauss's model of a network with clustering and present an analytic mean-field solution
which is exact in the limit of large network size. Previous computer simulations have revealed 
a degenerate region in the model's parameter space in which triangles of adjacent edges clump 
together to form unrealistically dense subgraphs, and perturbation calculations have been found to 
break down in this region at all orders. Our solution shows that this region corresponds to a classic 
symmetry-broken phase and that the onset of the degeneracy corresponds to a first-order phase 
transition in the density of the network. 



I. INTRODUCTION 

The last few years have seen a surge of interest within
the scientific community in the properties of networks of
various kinds. In parallel with empirical studies of
real-world networks such as the Internet, the worldwide
web, biological networks, and social networks, researchers
have developed theoretical models and mathematical tools
to explain the rich structure and nontrivial
characteristics that large-scale networks exhibit.

The most fundamental of network models may be the
Bernoulli random graph (also sometimes called the
Erdős-Rényi model after two well-known mathematicians
who were among the first to study it). In this model,
$n$ identical vertices are joined together in pairs by
edges, each possible edge appearing with independent
probability $p$, for a total of $\binom{n}{2}p$ edges on
average. This model can be thought of as a special case
of the much larger class of exponential random graphs,
which is the class of ensembles of graphs that maximize
ensemble entropy under a given set of constraints
(usually imposed by observations of the properties of an
actual network in the real world). The appropriate
constraint for the Bernoulli random graph is a constraint
on the total number of edges in the graph.
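As a quick illustration of the edge count quoted above, the following sketch samples Bernoulli random graphs and compares the observed mean number of edges with $\binom{n}{2}p$. The function names, the seed, and the values $n = 50$, $p = 0.1$ are arbitrary choices made for this example and do not come from the paper.

```python
import random
from math import comb


def sample_bernoulli_graph(n, p, rng):
    """Return the edge list of one sample: each of the binom(n, 2)
    vertex pairs is kept independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


def expected_edges(n, p):
    """Mean number of edges of the Bernoulli random graph."""
    return comb(n, 2) * p


rng = random.Random(0)          # fixed seed, illustrative only
n, p = 50, 0.1                  # illustrative parameter values
counts = [len(sample_bernoulli_graph(n, p, rng)) for _ in range(200)]
mean_edges = sum(counts) / len(counts)
print("sampled mean:", mean_edges, " binom(n,2)*p:", expected_edges(n, p))
```

The sampled mean should fluctuate around the analytic value, with a spread that shrinks as more graphs are drawn.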

The exponential random graph model defines a probability
distribution over a specified set of possible graphs such
that the probability $P(G)$ of a particular graph $G$ is
proportional to $e^{-H(G)}$, where

$$ H(G) = \sum_i \theta_i m_i(G). \qquad (1) $$

$H(G)$ is called the graph Hamiltonian, $\{m_i\}$ is the
set of observables upon which the relevant constraints
act, and $\{\theta_i\}$ is a set of real-valued conjugate
fields which we can vary so as to match the properties of
the model to the real-world network under consideration.
Exact or approximate solutions of average properties of
the ensemble are possible for a variety of graph
Hamiltonians, including graphs with arbitrary degree
distributions, directed-graph models with reciprocity,
the so-called 2-star model, and others.
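As a worked check of the statement that the Bernoulli random graph is a special case, take a single observable, the edge count $m(G)$, with conjugate field $\theta$. The steps below are a sketch written in the notation used later in the paper ($\sigma_{ij}$ for adjacency-matrix elements); the factorization relies only on the edges being independent when $H$ is linear in the $\sigma_{ij}$.

```latex
\begin{align}
H(G) &= \theta\, m(G) = \theta \sum_{i<j} \sigma_{ij}, \\
Z &= \sum_G e^{-H(G)}
   = \prod_{i<j} \sum_{\sigma_{ij}=0,1} e^{-\theta \sigma_{ij}}
   = \bigl(1 + e^{-\theta}\bigr)^{\binom{n}{2}}, \\
p &= \langle \sigma_{ij} \rangle
   = \frac{e^{-\theta}}{1 + e^{-\theta}}
   = \frac{1}{e^{\theta} + 1}.
\end{align}
```

Each edge therefore appears independently with probability $p = 1/(e^{\theta}+1)$, which is the Bernoulli random graph with $\binom{n}{2}p$ edges on average.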

In this paper we give a solution of a particular famous
exponential random graph model, the clustering model of
Strauss. This model mimics the phenomenon of network
transitivity or clustering, which has been much discussed
in the networks literature. The model was originally
proposed in 1981 and has recently attracted the attention
of the physics community [19, 20], where the question of
how properly to model transitivity has proved a
persistent stumbling block for theorists.



II. STRAUSS'S MODEL OF CLUSTERING 

Strauss's model is simple to define. The appropriate
graph observables are the number of edges $m(G)$ and the
number of triangles $t(G)$, so that the Hamiltonian can
be written

$$ H(G) = \theta\, m(G) - \alpha\, t(G)
        = \theta \sum_{i<j} \sigma_{ij}
        - \alpha \sum_{i<j<k} \sigma_{ij}\sigma_{jk}\sigma_{ki}, \qquad (2) $$

where $\sigma_{ij} = \sigma_{ji}$ is an element of the
adjacency matrix having value 1 if an edge exists between
vertices $i$ and $j$ and 0 otherwise. When $\alpha > 0$,
this Hamiltonian encourages the formation of triangles in
the network by assigning lower "energy" to graphs with
many triangles.
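The Hamiltonian above is easy to evaluate for any given graph. The sketch below computes $m(G)$, $t(G)$, and $H(G)$ from a symmetric 0/1 adjacency matrix with zero diagonal; the function name and the example parameter values are illustrative assumptions, and the triangle count uses the standard identity $t = \mathrm{tr}(A^3)/6$.

```python
import numpy as np


def strauss_hamiltonian(adj, theta, alpha):
    """Return (H, m, t) for a symmetric 0/1 adjacency matrix with zero
    diagonal, where H = theta * m - alpha * t."""
    adj = np.asarray(adj, dtype=np.int64)
    m = int(adj.sum()) // 2                      # each edge is counted twice
    t = int(np.trace(adj @ adj @ adj)) // 6      # each triangle is counted 6 times
    return theta * m - alpha * t, m, t


# A single triangle on three vertices: m = 3, t = 1.
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
H, m, t = strauss_hamiltonian(triangle, theta=1.0, alpha=0.5)
print(H, m, t)
```

For positive $\alpha$, closing a path of two edges into a triangle costs $\theta$ for the extra edge but gains $\alpha$ for each triangle created, which is the mechanism that favors clustering.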

Although the Hamiltonian seems simple enough, Strauss
found via numerical simulations that the model sometimes
behaved strangely, developing in certain parameter
regimes a "degenerate state," a condensed phase in which
many triangles form but tend to stick together in local
regions of the graph, rather than spreading uniformly
over it. Recently Burda et al. [19] have performed a
perturbation theoretic analysis of the model, finding
that the formation of this condensed phase corresponds to
a point at which the perturbation series breaks down at
all orders simultaneously. The nature of this point and
of the condensed phase, however, has not been well
understood, and a complete solution of the model has been
lacking. In the next section, we present a solution of
the model based on a mean-field approach which we believe
to be exact for all parameter values in the limit of
large system size. Using this solution, we show that the
model possesses a classic second-order phase transition
between a high-symmetry regime and a symmetry-broken one,
with a line of first-order transitions between states of
high and low density in the symmetry-broken regime. The
formation of the "condensed phase" observed by Strauss
corresponds precisely to the first-order transition from
low to high density.



III. ANALYSIS 



A. Mean-field solution 



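Let $H_{ij}$ be the sum of all terms in the Hamiltonian,
Eq. (2), that involve $\sigma_{ij}$,

$$ H_{ij} = \theta\sigma_{ij}
          - \alpha \sum_{k \ne i,j} \sigma_{ij}\sigma_{jk}\sigma_{ki}
          = \sigma_{ij}\Bigl(\theta - \alpha \sum_{k \ne i,j}
            \sigma_{jk}\sigma_{ki}\Bigr), \qquad (3) $$

and let $H'$ be the remaining terms, so that
$H = H_{ij} + H'$. The mean value
$\langle\sigma_{ij}\rangle$ of $\sigma_{ij}$ can then be
written as

$$ \langle\sigma_{ij}\rangle
   = 0 \times P(\sigma_{ij}=0) + 1 \times P(\sigma_{ij}=1)
   = \frac{1}{Z} \sum_G e^{-H(G)}\,
     \frac{1}{e^{\theta - \alpha\sum_{k \ne i,j}\sigma_{jk}\sigma_{ki}} + 1}
   = \tfrac12 \Bigl\langle 1 - \tanh\Bigl[\tfrac12\theta
     - \tfrac12\alpha \sum_{k \ne i,j}\sigma_{jk}\sigma_{ki}\Bigr]
     \Bigr\rangle, \qquad (4) $$

where $Z = \sum_G e^{-H(G)}$ is the partition function.
Here $\langle\ldots\rangle$ indicates the average within
the ensemble, and the derivation so far has been exact.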
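The exactness of this step can be checked by brute force on a very small graph. The sketch below enumerates all $2^6$ graphs on $n = 4$ vertices and compares $\langle\sigma_{ij}\rangle$ with the ensemble average of $1/(e^{\theta - \alpha\sum_k \sigma_{jk}\sigma_{ki}} + 1)$, with the sum running over $k \ne i, j$. The function names and the parameter values $\theta = 0.7$, $\alpha = 1.3$ are arbitrary choices for this example.

```python
import itertools
import math


def sigma(state, a, b):
    """Adjacency element for the unordered pair {a, b}."""
    return state[(a, b) if a < b else (b, a)]


def check_exact_identity(n=4, theta=0.7, alpha=1.3, i=0, j=1):
    """Return (<sigma_ij>, <1/(exp(theta - alpha*sum_k s_jk s_ki) + 1)>)
    computed by exhaustive enumeration of all graphs on n vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    triples = list(itertools.combinations(range(n), 3))
    Z = lhs = rhs = 0.0
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        state = dict(zip(pairs, bits))
        m = sum(bits)
        t = sum(sigma(state, a, b) * sigma(state, b, c) * sigma(state, a, c)
                for a, b, c in triples)
        weight = math.exp(-(theta * m - alpha * t))   # Boltzmann weight e^{-H}
        local = sum(sigma(state, j, k) * sigma(state, k, i)
                    for k in range(n) if k not in (i, j))
        Z += weight
        lhs += weight * sigma(state, i, j)
        rhs += weight / (math.exp(theta - alpha * local) + 1.0)
    return lhs / Z, rhs / Z


lhs, rhs = check_exact_identity()
print(lhs, rhs)
```

The two numbers agree to floating-point precision, since no approximation has been made up to this point; the mean-field step comes only when the sum inside the exponent is replaced by its average.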

By analogy with spin models, let us call the expression
within the brackets in Eq. (4) the local field coupled to
spin $\sigma_{ij}$. The mean-field approximation involves
replacing the spin variables in the local field with
their ensemble averages, which in this case means
$\sigma_{jk}\sigma_{ki} \to q = \langle\sigma_{jk}\sigma_{ki}\rangle$.
Defining also the connectance
$p = \langle\sigma_{ij}\rangle$, we now have

$$ p = \frac{1}{e^{\theta - \alpha(n-2)q} + 1}
     = \tfrac12\bigl[1 - \tanh\bigl(\tfrac12\theta
       - \tfrac12\alpha(n-2)q\bigr)\bigr]. \qquad (5) $$

Now we set up an equation for $q$ via a similar method.
Noting that $\sigma_{ik}\sigma_{kj} = 1$ only when both
$\sigma_{ik} = 1$ and $\sigma_{kj} = 1$, we can write:

$$ q = \langle\sigma_{ik}\sigma_{kj}\rangle
     = \frac{1 + (e^{\alpha} - 1)p}
            {(e^{\theta - \alpha(n-3)q} + 1)^2 + (e^{\alpha} - 1)p},
   \qquad (6) $$

where in the final line we have made the mean-field
approximation again, and made use of the property that
$e^{\alpha\sigma_{ij}} = 1 + (e^{\alpha} - 1)\sigma_{ij}$,
since $\sigma_{ij} = 0$ or 1.

We now have two equations in two unknowns, which can be
solved by substituting Eq. (5) into Eq. (6) to give a
self-consistency condition on $q$:

$$ q = \frac{e^{\theta - \alpha(n-2)q} + e^{\alpha}}
            {(e^{\theta - \alpha(n-3)q} + 1)^2
             (e^{\theta - \alpha(n-2)q} + 1) + (e^{\alpha} - 1)}
     \equiv Q(q). \qquad (7) $$

FIG. 1: (Color online) Graphical solutions of $q = Q(q)$.
Depending on the values of the parameters $\theta$ and
$\alpha$, the line $y = q$ (dashed) intersects with
$y = Q(q)$ (solid) either three times or only once. The
parameters $(\theta, n\alpha)$ for the two curves shown
are (2.3, 6.0) and (0.5, 2.0).



In Fig. 1 we show a plot of the forms $y = q$ and
$y = Q(q)$ as functions of $q$. The intersections of the
two curves give the solutions of Eq. (7). As we can see,
depending on the values of $\theta$ and $\alpha$, the
curves can intersect at either one or three points in the
allowed domain $0 < q < 1$. The regime in which there are
three solutions corresponds to a symmetry-broken phase
with only the outer two solutions being stable
(corresponding to minima of the free energy). Thus, the
system displays the classic phenomenology of a
second-order phase transition, with a critical point
separating a high-symmetry phase from a symmetry-broken
one having regimes of high and low density and an
intermediate region of coexistence of the two. In Fig. 2
we show the phase diagram of the system.
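The one-versus-three intersection behavior can be reproduced numerically. The sketch below writes the self-consistency condition as $Q(q) - q = 0$, with $Q(q) = (e^{\theta-\alpha(n-2)q} + e^{\alpha}) / [(e^{\theta-\alpha(n-3)q}+1)^2 (e^{\theta-\alpha(n-2)q}+1) + (e^{\alpha}-1)]$, and locates the roots by scanning a grid for sign changes and refining each by bisection. The system size $n = 500$, the grid resolution, and the function names are assumptions made for this example; the two parameter pairs are those quoted in the caption of Fig. 1.

```python
import math


def Q(q, theta, alpha, n):
    """Right-hand side of the self-consistency condition q = Q(q)."""
    a = math.exp(theta - alpha * (n - 2) * q)
    b = math.exp(theta - alpha * (n - 3) * q)
    ea = math.exp(alpha)
    return (a + ea) / ((b + 1.0) ** 2 * (a + 1.0) + (ea - 1.0))


def fixed_points(theta, n_alpha, n=500, grid=2000):
    """Return all solutions of q = Q(q) in (0, 1) found by a grid scan
    followed by bisection. n_alpha is the product n * alpha."""
    alpha = n_alpha / n

    def f(q):
        return Q(q, theta, alpha, n) - q

    roots = []
    for k in range(grid):
        lo, hi = k / grid, (k + 1) / grid
        if f(lo) * f(hi) < 0:                 # a root is bracketed here
            for _ in range(60):               # bisection refinement
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


for theta, n_alpha in [(2.3, 6.0), (0.5, 2.0)]:
    print((theta, n_alpha), [round(q, 4) for q in fixed_points(theta, n_alpha)])
```

In the three-root case the middle root is the unstable one; the connectance on each stable branch follows from $p = 1/(e^{\theta-\alpha(n-2)q}+1)$ evaluated at the corresponding $q$.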

Finally we introduce another mean-field equation for
$r = \langle\sigma_{ij}\sigma_{jk}\sigma_{ki}\rangle$,
which gives the density of triangles in the network (the
expected number of triangles is $\binom{n}{3} r$):



{c^ijC^jk^ki 



(e»""("-2)9 + l)V(c"-l) 



(8) 



In Fig. 3 we compare our solutions for $p$, $q$, and $r$
with simulation results for a system of size $n = 500$
and, as we can see, the agreement between theory and
simulation is excellent.
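The paper does not say which Monte Carlo algorithm was used for these simulations. As one possible sketch, the code below uses single-edge heat-bath (Gibbs) updates: a vertex pair is chosen at random and its edge is set to 1 with the exact conditional probability $1/(e^{\theta - \alpha c_{ij}} + 1)$, where $c_{ij}$ is the number of common neighbors of $i$ and $j$. The function name, the small system size, the sweep counts, and the seed are all assumptions chosen to keep the example fast.

```python
import math
import numpy as np


def simulate_strauss(n, theta, n_alpha, sweeps=100, burn_in=20, seed=0):
    """Heat-bath sampling of the Strauss model; returns the connectance
    averaged over the sweeps after burn-in. n_alpha is n * alpha."""
    rng = np.random.default_rng(seed)
    alpha = n_alpha / n
    adj = np.zeros((n, n), dtype=np.int64)       # start from the empty graph
    n_pairs = n * (n - 1) // 2
    densities = []
    for sweep in range(sweeps):
        for _ in range(n_pairs):
            i = int(rng.integers(n))
            j = int(rng.integers(n - 1))
            if j >= i:                            # pick j != i uniformly
                j += 1
            common = int(adj[i] @ adj[j])         # common neighbors of i and j
            p_on = 1.0 / (math.exp(theta - alpha * common) + 1.0)
            adj[i, j] = adj[j, i] = int(rng.random() < p_on)
        if sweep >= burn_in:
            densities.append(adj.sum() / (2.0 * n_pairs))
    return float(np.mean(densities))


# Sanity check at alpha = 0, where the model reduces to the Bernoulli
# random graph with p = 1 / (e^theta + 1).
density = simulate_strauss(n=30, theta=1.0, n_alpha=0.0)
print(density, 1.0 / (math.e + 1.0))
```

In the coexistence region a chain of this kind can remain in whichever of the two stable states it reaches first, so runs started from the empty and from the complete graph may need to be compared.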



B. The approximation 

As mentioned in the introduction, we believe that the
mean-field solution found in the previous section is
exact, because in the limit of large system size the
system becomes fundamentally infinite-dimensional, and
mean-field theory is usually exact in the large dimension
limit.

FIG. 2: The phase diagram in the $(n\alpha, \theta)$
space. The shaded area corresponds to the coexistence
region in which the system can be in either of two stable
states, one of high density and one of low.

In fact, Eq. (5) really makes two approximations. One is
the mean-field approximation
$\sigma_{jk}\sigma_{ki} \to \langle\sigma_{jk}\sigma_{ki}\rangle$,
but we have also assumed that the average of the tanh can
be approximated by the tanh of the average. While the
first approximation can be justified on the basis of the
high effective dimension of the system, the second needs
more attention. It can be justified by performing a
series expansion of the tanh, applying the mean-field
approximation to the series term by term, and then
resumming the result again. However, while this method
works, it is not as simple as our brief description makes
it sound, because the series involves averages over
arbitrarily high moments of the graph operators and
proving that these terms are negligible requires some
care.
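To get a feel for the size of the second approximation, the sketch below compares the average of the tanh with the tanh of the average in a toy setting. It assumes, purely for illustration, that the local-field sum is a binomial variable $S \sim \mathrm{Binomial}(n-2, q)$, i.e. a sum of $n-2$ independent terms each equal to 1 with probability $q$. This independence is an assumption of the example and is not the correlated ensemble of the model itself. The parameter values are likewise illustrative.

```python
import math


def tanh_gap(n, theta, n_alpha, q):
    """Return <tanh(th + al*S)> - tanh(th + al*<S>) for
    S ~ Binomial(n - 2, q), with th = -theta/2 and al = alpha/2."""
    th = -0.5 * theta
    al = 0.5 * n_alpha / n
    N = n - 2
    avg = 0.0
    for s in range(N + 1):
        # log of the binomial probability, via lgamma to avoid overflow
        log_pmf = (math.lgamma(N + 1) - math.lgamma(s + 1)
                   - math.lgamma(N - s + 1)
                   + s * math.log(q) + (N - s) * math.log(1.0 - q))
        avg += math.exp(log_pmf) * math.tanh(th + al * s)
    return avg - math.tanh(th + al * N * q)


for n in (100, 1000, 10000):
    print(n, tanh_gap(n, theta=2.3, n_alpha=6.0, q=0.93))
```

With $n\alpha$ held fixed the gap shrinks roughly in proportion to $1/n$ in this toy setting, because the fluctuations of the argument of the tanh have variance of order $1/n$.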

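Let us rewrite Eq. (4) thus:

$$ p = \langle\sigma_{ij}\rangle
     = \tfrac12\Bigl(1 + \Bigl\langle\tanh\Bigl(-\tfrac12\theta
       + \tfrac12\alpha \sum_k S_k\Bigr)\Bigr\rangle\Bigr)
     = \tfrac12\bigl(1 + \langle\tanh(\tilde\theta
       + \tilde\alpha S)\rangle\bigr), \qquad (9) $$

where for convenience we have defined
$\tilde\theta = -\tfrac12\theta$,
$\tilde\alpha = \tfrac12\alpha$,
$S_k = \sigma_{ik}\sigma_{kj}$, and
$S = \sum_{k \ne i,j} S_k$. Expanding the tanh about
$\tilde\theta$, we get

$$ \langle\tanh(\tilde\theta + \tilde\alpha S)\rangle
   = \sum_{m=0}^{\infty} \frac{\tanh^{(m)}(\tilde\theta)}{m!}\,
     \tilde\alpha^m \langle S^m\rangle. \qquad (10) $$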

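FIG. 3: (Color online) Comparison of our analytic
solution (solid lines) and Monte Carlo simulation results
(circles) for $p = \langle\sigma_{ij}\rangle$,
$q = \langle\sigma_{jk}\sigma_{ki}\rangle$, and
$r = \langle\sigma_{ij}\sigma_{jk}\sigma_{ki}\rangle$, for
a system of $n = 500$ vertices. The parameter values were
(a) $\theta = 2.2$ and (b) $\theta = 0.53$. (See Fig. 2.)

Now, keeping in mind that
$S_k^m = \sigma_{ik}^m \sigma_{kj}^m = \sigma_{ik}\sigma_{kj} = S_k$
for any $m \ge 1$, since $\sigma_{ij} = 0, 1$, we can
write the correlation functions $\langle S^m\rangle$ in
the form

$$ \langle S^m\rangle
   = \Bigl\langle \Bigl(\sum_k S_k\Bigr)\Bigl(\sum_k S_k\Bigr)
     \cdots \Bigl(\sum_k S_k\Bigr) \Bigr\rangle \qquad (11a) $$
$$ = a_{m,1}\langle S_1 + S_2 + \ldots\rangle
   + a_{m,2}\langle S_1 S_2 + S_1 S_3 + \ldots\rangle
   + a_{m,3}\langle S_1 S_2 S_3 + \ldots\rangle + \ldots \qquad (11b) $$
$$ = a_{m,1}(n-2)\,q + a_{m,2}\binom{n-2}{2} q^2
   + a_{m,3}\binom{n-2}{3} q^3 + \ldots \qquad (11c) $$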

In the last line we have made the assumption that a
product $\langle S_1 S_2\rangle$ can be approximated as
$\langle S_1\rangle\langle S_2\rangle = q^2$, and
similarly for higher products. This approximation is of
the nature of a mean-field approximation, ignoring
correlations between single pairs of spin variables,
which will be of order $1/n$ in the large system size
limit where each variable interacts with an arbitrarily
large number of others.

The coefficient $a_{m,l}$ in Eq. (11b) is the number of
ways of selecting one $S_k$ from each of the $m$ sums in
(11a) so that there are $l$ unique indices in the
resulting product. It is simple to show by induction that
the exponential generating function for $a_{m,l}$
satisfies

$$ \sum_{m=0}^{\infty} \frac{a_{m,l}}{m!}\, z^m
   = (e^z - 1)^l. \qquad (12) $$

Then, by repeated differentiation,

$$ a_{m,l} = \left[\frac{d^m}{dz^m}(e^z - 1)^l\right]_{z=0}.
   \qquad (13) $$
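The combinatorial meaning of the coefficient can be checked directly. Read as the number of ways of choosing one index from each of $m$ sums so that a given set of $l$ indices all appear, $a_{m,l}$ is the number of surjections from $m$ positions onto $l$ labels. Expanding $(e^z - 1)^l$ by the binomial theorem and differentiating $m$ times at $z = 0$ gives $\sum_j (-1)^{l-j}\binom{l}{j} j^m$. The sketch below compares the two; the function names are illustrative.

```python
from itertools import product
from math import comb


def a_bruteforce(m, l):
    """Count the ways to pick one of l given labels from each of m sums
    such that every one of the l labels appears at least once."""
    return sum(1 for choice in product(range(l), repeat=m)
               if len(set(choice)) == l)


def a_from_generating_function(m, l):
    """m-th derivative of (e^z - 1)^l at z = 0, obtained from the
    binomial expansion (e^z - 1)^l = sum_j C(l, j) (-1)^(l-j) e^(jz)."""
    return sum((-1) ** (l - j) * comb(l, j) * j ** m for j in range(l + 1))


for m in range(1, 7):
    row = [a_from_generating_function(m, l) for l in range(1, m + 1)]
    assert row == [a_bruteforce(m, l) for l in range(1, m + 1)]
    print(m, row)
```

Note that $a_{m,m} = m!$ and $a_{m,l} = 0$ for $l > m$, so the term with the highest power of $(n-2)q$ in the expansion of $\langle S^m\rangle$ is $m!\binom{n-2}{m} q^m$, which approaches $[(n-2)q]^m$ for large $n$.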



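Combining this result with Eq. (11c) and taking the limit
of large $n$, we find

$$ \langle S^m\rangle
   = \left[\frac{d^m}{dz^m} \sum_{l=0}^{\infty}
     \frac{(e^z - 1)^l (n-2)^l q^l}{l!}\right]_{z=0}
   = \left[\frac{d^m}{dz^m}\, e^{(n-2)q(e^z - 1)}\right]_{z=0}.
   \qquad (14) $$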



The differentiation in the last line can be carried out
explicitly for any given value of $m$, but there is no
simple closed-form expression for the general case.
However, none is needed in the large $n$ limit. Each
successive differentiation with respect to $z$ generates
an extra factor of $(n-2)q$. But the graphs we are
interested in are dense, meaning that $p$, $q$, and $r$
all tend to finite, non-zero limiting values as
$n \to \infty$. Thus $(n-2)q$ is a large quantity and to
leading order we need only retain the highest-order term
in the derivative, which is simply $[(n-2)q]^m$. Thus
Eq. (10) becomes

$$ \langle\tanh(\tilde\theta + \tilde\alpha S)\rangle
   = \sum_{m=0}^{\infty} \frac{\tanh^{(m)}(\tilde\theta)\,
     \tilde\alpha^m}{m!}\, \bigl((n-2)q\bigr)^m
   = \tanh\bigl(\tilde\theta + \tilde\alpha(n-2)q\bigr) \qquad (15) $$

and

$$ p = \tfrac12\bigl[1 + \tanh\bigl(\tilde\theta
     + \tilde\alpha(n-2)q\bigr)\bigr], \qquad (16) $$

which is identical with Eq. (5). A similar derivation can
be performed for Eq. (6), and hence the entire mean-field
solution is exact in the limit of large system size.



IV. DISCUSSION

What does our solution of the Strauss model tell us?
To begin with, it tells us the precise nature of and
reason for the "degenerate state" observed by Strauss in
simulation studies and by Burda et al. [19] in their
perturbative calculations. Strauss's observations were
correct: something special does happen to the model in
the degenerate region. In fact, there is a first-order
phase transition driven by the "field" parameter coupled
to the number of edges in the graph. This also explains
the breakdown of the perturbation expansion at this
point, since such expansions typically break down at
first-order transitions because of the corresponding pole
in the free energy. The degenerate phase of the model is
a high-density phase in which there is a large number of
triangles in the graph, forming what appears to be almost
a complete graph: the connectance of the network is close
to 1 in this regime (Fig. 3).

More importantly, the first-order nature of the
transition means there is a discontinuous jump in the
density of triangles as we enter the degenerate state and
thus there is no intermediate set of parameter values
that will give the graph a moderate density of triangles
as seen in real-world networks. While Strauss's model
seems the most natural form for an exponential random
graph model of transitivity, our results imply that it
will in fact never be a good model of real-world networks
with moderate clustering. One can of course reduce the
value of the parameter $\alpha$ until we pass through the
critical point so that the first-order transition
disappears, in which case we recover smooth variation of
the density of the graph with $\theta$, but then the
graph no longer has any significant clustering because of
the small value of $\alpha$.

These observations do not necessarily imply that
exponential random graphs are incapable of mimicking
networks with clustering; indeed they may present our
best current hope for making clustered network models.
Our results imply however that Strauss's original model
with a single term in the Hamiltonian to encourage
triangles must, at the very least, be augmented in some
way in order to achieve this aim.



V. CONCLUSIONS 

In this paper we have given a mean-field solution of
Strauss's model of a network with clustering. Because of
the intrinsically high-dimensional nature of networks, we
believe this solution to be exact in the limit of large
system size, which is the main case one is normally
interested in. We have also performed Monte Carlo
simulations of the model that confirm our solution to
high accuracy. Our solution indicates that the model has
no regime in which it displays moderate levels of
clustering similar to those seen in real-world networks;
presumably it will be necessary to introduce further
terms into the Hamiltonian to avoid this pathology.

We believe exponential random graphs offer one of the
most flexible tools for the modeling of general networks,
and look forward to further developments. We hope that
the formalism introduced here will serve as a practical
starting point for a variety of problems.